| Number of Variables | 12 |
|---|---|
| Number of Rows | 31762 |
| Missing Cells | 0 |
| Missing Cells (%) | 0.0% |
| Duplicate Rows | 11203 |
| Duplicate Rows (%) | 35.3% |
| Total Size in Memory | 2.9 MB |
| Average Row Size in Memory | 96.0 B |
| Variable Types | numerical (12) |

| jaro_distance is skewed | Skewed |
|---|---|
| jaro_winkler_distance is skewed | Skewed |
| overlap_coefficient_distance is skewed | Skewed |
| soft_tfidf_distance is skewed | Skewed |
| partial_ration_distance is skewed | Skewed |
| Dataset has 11203 (35.27%) duplicate rows | Duplicates |
| levenshtain_distance has 2424 (7.63%) zeros | Zeros |
| needleman_wunsch_distance has 2424 (7.63%) zeros | Zeros |
| affine_gap_distance has 2424 (7.63%) zeros | Zeros |
| smith_waterman_distance has 2424 (7.63%) zeros | Zeros |
| jaro_winkler_distance has 21670 (68.23%) zeros | Zeros |
| overlap_coefficient_distance has 9312 (29.32%) zeros | Zeros |
| generalized_jaccard_distance has 2524 (7.95%) zeros | Zeros |
| tfidf_distance has 2426 (7.64%) zeros | Zeros |
| partial_ration_distance has 4888 (15.39%) zeros | Zeros |
| bag_distance_distance has 2428 (7.64%) zeros | Zeros |
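
The Duplicates and Zeros alerts, and the memory figures in the overview, can be approximated with pandas. This is a minimal sketch, assuming the profiled data sits in a pandas DataFrame of float64 columns. The helper `summarize_alerts` is hypothetical. The report does not state its exact duplicate-counting rule or its alert thresholds, so the sketch counts a row as a duplicate when it repeats an earlier row (`keep="first"`) and lists every column that contains at least one zero.

```python
import pandas as pd


def summarize_alerts(df: pd.DataFrame) -> dict:
    """Hypothetical helper: recompute duplicate, zero, and memory figures."""
    n_rows = len(df)
    # Rows identical to an earlier row (first occurrences are not counted).
    n_dup = int(df.duplicated(keep="first").sum())
    # Exact zeros per column; the report's alert threshold is unknown,
    # so every column with at least one zero is listed.
    zeros = (df == 0).sum()
    zero_counts = {col: int(n) for col, n in zeros.items() if n > 0}
    # Bytes used by the column data only (index excluded).
    total_bytes = int(df.memory_usage(index=False, deep=True).sum())
    return {
        "duplicate_rows": n_dup,
        "duplicate_pct": 100.0 * n_dup / n_rows,
        "zero_counts": zero_counts,
        "zero_pct": {c: 100.0 * n / n_rows for c, n in zero_counts.items()},
        "total_bytes": total_bytes,
        "avg_row_bytes": total_bytes / n_rows,
    }


if __name__ == "__main__":
    # Tiny synthetic frame using two of the report's column names.
    df = pd.DataFrame(
        {
            "levenshtain_distance": [0.0, 0.5, 0.5, 0.9],
            "jaro_winkler_distance": [0.0, 0.0, 0.0, 0.3],
        }
    )
    print(summarize_alerts(df))
```

If all 12 columns are float64 (8 bytes per value), a row occupies 12 × 8 = 96 bytes, which matches the reported Average Row Size in Memory, and 31762 × 96 = 3,049,152 bytes is about 2.9 MiB, matching Total Size in Memory.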

numerical

| Approximate Distinct Count | 1657 |
|---|---|
| Approximate Unique (%) | 5.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.5174 |
| Minimum | 0 |
| Maximum | 0.9362 |
| Zeros | 2424 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.4386 |
| Median | 0.5647 |
| Q3 | 0.6571 |
| 95th Percentile | 0.7759 |
| Maximum | 0.9362 |
| Range | 0.9362 |
| IQR | 0.2185 |

| Mean | 0.5174 |
|---|---|
| Standard Deviation | 0.2073 |
| Variance | 0.04298 |
| Sum | 16432.5285 |
| Skewness | -1.1148 |
| Kurtosis | 0.794 |
| Coefficient of Variation | 0.4007 |
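
Several rows in each variable's tables are derived from others, which gives a quick consistency check: Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = Standard Deviation², Coefficient of Variation = Standard Deviation / Mean, Mean = Sum / Number of Rows, and Approximate Unique (%) = Approximate Distinct Count / Number of Rows. The sketch below plugs in the values reported for the variable above. The relative tolerance of 0.5% is an assumption chosen to absorb the report's rounding; the names `derive` and `check` are illustrative.

```python
import math

# Values copied from the tables of the variable above.
REPORTED = {
    "rows": 31762,
    "distinct": 1657,
    "unique_pct": 5.2,
    "mean": 0.5174,
    "std": 0.2073,
    "variance": 0.04298,
    "sum": 16432.5285,
    "min": 0.0,
    "max": 0.9362,
    "q1": 0.4386,
    "q3": 0.6571,
    "range": 0.9362,
    "iqr": 0.2185,
    "cv": 0.4007,
}


def derive(r: dict) -> dict:
    """Recompute the derived statistics from the base ones."""
    return {
        "mean": r["sum"] / r["rows"],
        "variance": r["std"] ** 2,
        "cv": r["std"] / r["mean"],
        "range": r["max"] - r["min"],
        "iqr": r["q3"] - r["q1"],
        "unique_pct": 100.0 * r["distinct"] / r["rows"],
    }


def check(r: dict, rel_tol: float = 5e-3) -> dict:
    """Compare each derived value with the reported one, tolerating rounding."""
    return {
        key: math.isclose(value, r[key], rel_tol=rel_tol, abs_tol=1e-9)
        for key, value in derive(r).items()
    }


if __name__ == "__main__":
    for key, ok in check(REPORTED).items():
        print(f"{key:>10}: {'consistent' if ok else 'MISMATCH'}")
```

A mismatch for any key would point to a transcription or rounding problem in that row rather than a property of the data.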

numerical

| Approximate Distinct Count | 3253 |
|---|---|
| Approximate Unique (%) | 10.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.6796 |
| Minimum | 0 |
| Maximum | 1.3435 |
| Zeros | 2424 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.5556 |
| Median | 0.716 |
| Q3 | 0.8605 |
| 95th Percentile | 1.0946 |
| Maximum | 1.3435 |
| Range | 1.3435 |
| IQR | 0.3049 |

| Mean | 0.6796 |
|---|---|
| Standard Deviation | 0.2867 |
| Variance | 0.08222 |
| Sum | 21584.7928 |
| Skewness | -0.7639 |
| Kurtosis | 0.4455 |
| Coefficient of Variation | 0.4219 |

numerical

| Approximate Distinct Count | 8946 |
|---|---|
| Approximate Unique (%) | 28.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.5745 |
| Minimum | 0 |
| Maximum | 1.08 |
| Zeros | 2424 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.4768 |
| Median | 0.6158 |
| Q3 | 0.7281 |
| 95th Percentile | 0.8902 |
| Maximum | 1.08 |
| Range | 1.08 |
| IQR | 0.2513 |

| Mean | 0.5745 |
|---|---|
| Standard Deviation | 0.2351 |
| Variance | 0.05529 |
| Sum | 18246.8173 |
| Skewness | -0.9479 |
| Kurtosis | 0.6316 |
| Coefficient of Variation | 0.4093 |

numerical

| Approximate Distinct Count | 1415 |
|---|---|
| Approximate Unique (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.5856 |
| Minimum | 0 |
| Maximum | 0.913 |
| Zeros | 2424 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.5217 |
| Median | 0.6489 |
| Q3 | 0.7286 |
| 95th Percentile | 0.8158 |
| Maximum | 0.913 |
| Range | 0.913 |
| IQR | 0.2068 |

| Mean | 0.5856 |
|---|---|
| Standard Deviation | 0.2183 |
| Variance | 0.04766 |
| Sum | 18599.7557 |
| Skewness | -1.4811 |
| Kurtosis | 1.5748 |
| Coefficient of Variation | 0.3728 |

numerical

| Approximate Distinct Count | 9710 |
|---|---|
| Approximate Unique (%) | 30.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.9862 |
| Minimum | 0.9091 |
| Maximum | 0.9955 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0.9091 |
|---|---|
| 5th Percentile | 0.973 |
| Q1 | 0.9848 |
| Median | 0.9881 |
| Q3 | 0.9904 |
| 95th Percentile | 0.9929 |
| Maximum | 0.9955 |
| Range | 0.08639 |
| IQR | 0.005659 |

| Mean | 0.9862 |
|---|---|
| Standard Deviation | 0.007521 |
| Variance | 5.6571e-05 |
| Sum | 31323.6713 |
| Skewness | -3.5694 |
| Kurtosis | 21.3391 |
| Coefficient of Variation | 0.007627 |
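
The variable above has no zeros, a minimum of 0.9091, and half its values packed between Q1 = 0.9848 and Q3 = 0.9904; a thin lower tail reaching down to 0.9091 is what produces the strong negative skewness (-3.5694) and the large kurtosis (21.3391). The report does not say which estimators it uses, so this sketch shows two common ones: the population moment ratios g1 (skewness) and g2 (excess kurtosis) computed with NumPy, and pandas' bias-corrected `Series.skew()` and `Series.kurt()`. With 31762 rows the bias correction changes these statistics very little. The helper names and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def moment_shape(values) -> dict:
    """Population skewness g1 and excess kurtosis g2 from central moments."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    return {"g1": m3 / m2 ** 1.5, "g2": m4 / m2 ** 2 - 3.0}


def pandas_shape(values) -> dict:
    """pandas' bias-corrected skewness and excess kurtosis."""
    s = pd.Series(values, dtype=float)
    return {"skew": s.skew(), "kurt": s.kurt()}


if __name__ == "__main__":
    # Hypothetical sample: most values near 0.99 with a short lower tail,
    # mimicking the shape (not the data) of the variable above.
    sample = [0.99] * 20 + [0.985, 0.98, 0.95, 0.91]
    print(moment_shape(sample))
    print(pandas_shape(sample))
```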

jaro_winkler_distance (numerical)

| Approximate Distinct Count | 3675 |
|---|---|
| Approximate Unique (%) | 11.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.107 |
| Minimum | 0 |
| Maximum | 0.5174 |
| Zeros | 21670 |
| Zeros (%) | 68.2% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0 |
| Median | 0 |
| Q3 | 0.2862 |
| 95th Percentile | 0.3963 |
| Maximum | 0.5174 |
| Range | 0.5174 |
| IQR | 0.2862 |

| Mean | 0.107 |
|---|---|
| Standard Deviation | 0.1605 |
| Variance | 0.02576 |
| Sum | 3398.1375 |
| Skewness | 0.9225 |
| Kurtosis | -0.9713 |
| Coefficient of Variation | 1.5001 |
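
The variable above has 21670 zeros (68.2% of rows), the count the alert list attributes to jaro_winkler_distance. Because there are no negatives and more than two thirds of the values are exactly zero, every quantile below roughly the 68th percentile is zero, which is why the 5th percentile, Q1 and Median all read 0 and the IQR (0.2862) equals Q3. This sketch reproduces that effect with pandas, assuming linear interpolation between order statistics (pandas' default); the report labels its distinct counts as approximate, so its quantiles may come from a different, approximate method. The helper `quantile_summary` and the sample series are hypothetical.

```python
import pandas as pd


def quantile_summary(s: pd.Series) -> dict:
    """Hypothetical helper mirroring the report's quantile table."""
    # Linear interpolation between order statistics (pandas default).
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    return {
        "p05": float(q.loc[0.05]),
        "q1": float(q.loc[0.25]),
        "median": float(q.loc[0.5]),
        "q3": float(q.loc[0.75]),
        "p95": float(q.loc[0.95]),
        "iqr": float(q.loc[0.75] - q.loc[0.25]),
        "zeros_pct": 100.0 * float((s == 0).mean()),
    }


if __name__ == "__main__":
    # Synthetic zero-inflated column: 7 of 10 values are exactly zero.
    s = pd.Series([0.0] * 7 + [0.2, 0.3, 0.4])
    print(quantile_summary(s))
```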

overlap_coefficient_distance (numerical)

| Approximate Distinct Count | 67 |
|---|---|
| Approximate Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.2511 |
| Minimum | 0 |
| Maximum | 0.9 |
| Zeros | 9312 |
| Zeros (%) | 29.3% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0 |
| Median | 0.2222 |
| Q3 | 0.4286 |
| 95th Percentile | 0.6 |
| Maximum | 0.9 |
| Range | 0.9 |
| IQR | 0.4286 |

| Mean | 0.2511 |
|---|---|
| Standard Deviation | 0.206 |
| Variance | 0.04242 |
| Sum | 7974.1946 |
| Skewness | 0.2437 |
| Kurtosis | -1.0661 |
| Coefficient of Variation | 0.8204 |
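
The variable above has 9312 zeros, the count the alert list gives for overlap_coefficient_distance, and only 67 distinct values. A small value set is what a ratio of small integer counts produces, such as the overlap coefficient |A ∩ B| / min(|A|, |B|) over two token sets; reported values such as 0.2222 (1 - 7/9) and 0.4286 (1 - 4/7) fit that pattern. The report does not say how the column was computed, so this sketch assumes distance = 1 - coefficient on lower-cased, whitespace-separated tokens. `overlap_coefficient_distance` and `possible_distances` are hypothetical functions, not a library API, and the handling of empty inputs is an assumed convention.

```python
from fractions import Fraction


def overlap_coefficient_distance(a: str, b: str) -> Fraction:
    """Hypothetical: 1 - |A ∩ B| / min(|A|, |B|) on lower-cased word tokens."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        return Fraction(0)  # assumed convention: two empty inputs are identical
    if not set_a or not set_b:
        return Fraction(1)  # assumed convention: nothing can be shared
    shared = len(set_a & set_b)
    return 1 - Fraction(shared, min(len(set_a), len(set_b)))


def possible_distances(max_tokens: int) -> set:
    """Every distance reachable when the smaller set has at most max_tokens tokens."""
    return {
        1 - Fraction(k, m)
        for m in range(1, max_tokens + 1)
        for k in range(0, m + 1)
    }


if __name__ == "__main__":
    d = overlap_coefficient_distance("acme corp ltd", "Acme Corporation Ltd")
    print(d, float(d))
    print(len(possible_distances(10)))
```

Exact `Fraction` arithmetic keeps the value set visible: with at most 10 tokens in the smaller set there are only 33 reachable distances. This illustrates how such a metric yields few distinct values; it is not a claim about how the 67 values in this column arose.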

generalized_jaccard_distance (numerical)

| Approximate Distinct Count | 152 |
|---|---|
| Approximate Unique (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.5561 |
| Minimum | 0 |
| Maximum | 0.9524 |
| Zeros | 2524 |
| Zeros (%) | 7.9% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.5 |
| Median | 0.6154 |
| Q3 | 0.7059 |
| 95th Percentile | 0.8 |
| Maximum | 0.9524 |
| Range | 0.9524 |
| IQR | 0.2059 |

| Mean | 0.5561 |
|---|---|
| Standard Deviation | 0.22 |
| Variance | 0.04839 |
| Sum | 17662.9177 |
| Skewness | -1.2735 |
| Kurtosis | 0.9169 |
| Coefficient of Variation | 0.3956 |

tfidf_distance (numerical)

| Approximate Distinct Count | 1002 |
|---|---|
| Approximate Unique (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.6463 |
| Minimum | 0 |
| Maximum | 0.9755 |
| Zeros | 2426 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.5848 |
| Median | 0.7226 |
| Q3 | 0.8075 |
| 95th Percentile | 0.8794 |
| Maximum | 0.9755 |
| Range | 0.9755 |
| IQR | 0.2228 |

| Mean | 0.6463 |
|---|---|
| Standard Deviation | 0.2374 |
| Variance | 0.05634 |
| Sum | 20528.4568 |
| Skewness | -1.5777 |
| Kurtosis | 1.7954 |
| Coefficient of Variation | 0.3672 |

numerical

| Approximate Distinct Count | 18015 |
|---|---|
| Approximate Unique (%) | 56.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.989 |
| Minimum | 0.9091 |
| Maximum | 0.9983 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0.9091 |
|---|---|
| 5th Percentile | 0.9757 |
| Q1 | 0.9877 |
| Median | 0.9912 |
| Q3 | 0.9933 |
| 95th Percentile | 0.9956 |
| Maximum | 0.9983 |
| Range | 0.08926 |
| IQR | 0.00563 |

| Mean | 0.989 |
|---|---|
| Standard Deviation | 0.007666 |
| Variance | 5.8773e-05 |
| Sum | 31414.0481 |
| Skewness | -3.6864 |
| Kurtosis | 22.4816 |
| Coefficient of Variation | 0.007751 |

partial_ration_distance (numerical)

| Approximate Distinct Count | 74 |
|---|---|
| Approximate Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.2894 |
| Minimum | 0 |
| Maximum | 0.74 |
| Zeros | 4888 |
| Zeros (%) | 15.4% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.15 |
| Median | 0.33 |
| Q3 | 0.42 |
| 95th Percentile | 0.53 |
| Maximum | 0.74 |
| Range | 0.74 |
| IQR | 0.27 |

| Mean | 0.2894 |
|---|---|
| Standard Deviation | 0.1712 |
| Variance | 0.02931 |
| Sum | 9192.94 |
| Skewness | -0.4033 |
| Kurtosis | -0.9196 |
| Coefficient of Variation | 0.5915 |

bag_distance_distance (numerical)

| Approximate Distinct Count | 1780 |
|---|---|
| Approximate Unique (%) | 5.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Memory Size | 508192 |
| Mean | 0.4025 |
| Minimum | 0 |
| Maximum | 0.8957 |
| Zeros | 2428 |
| Zeros (%) | 7.6% |
| Negatives | 0 |
| Negatives (%) | 0.0% |

| Minimum | 0 |
|---|---|
| 5th Percentile | 0 |
| Q1 | 0.2857 |
| Median | 0.4103 |
| Q3 | 0.5333 |
| 95th Percentile | 0.717 |
| Maximum | 0.8957 |
| Range | 0.8957 |
| IQR | 0.2476 |

| Mean | 0.4025 |
|---|---|
| Standard Deviation | 0.1943 |
| Variance | 0.03776 |
| Sum | 12784.2583 |
| Skewness | -0.2354 |
| Kurtosis | -0.2572 |
| Coefficient of Variation | 0.4828 |